PCA-HPR: A principle component analysis model for human promoter recognition

Authors

  • Xiaomeng Li
  • Jia Zeng
  • Hong Yan

Abstract

We describe a promoter recognition method named PCA-HPR to locate eukaryotic promoter regions and predict transcription start sites (TSSs). We computed codon (3-mer) and pentamer (5-mer) frequencies and built codon and pentamer frequency feature matrices to extract informative and discriminative features for effective classification. Principal component analysis (PCA) is applied to the feature matrices, and a subset of principal components (PCs) is selected for classification. Our system uses three neural network classifiers to distinguish promoters from exons, promoters from introns, and promoters from 3' untranslated regions (3'UTRs). We compared PCA-HPR with three well-known existing promoter prediction systems: DragonGSF, Eponine and FirstEF. Validation on all three test sets shows that PCA-HPR achieves the best performance of the four prediction systems.
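A minimal sketch of the feature-extraction stage summarized above, in Python with NumPy: overlapping codon (k = 3) and pentamer (k = 5) counts are turned into frequency feature matrices, PCA is computed by SVD of the mean-centered matrix, and a few leading PCs are kept. This is an illustration under stated assumptions, not the authors' implementation: normalizing by the total count, running PCA separately on each matrix, keeping two PCs per matrix, the toy sequences, and all function names (`kmer_frequencies`, `feature_matrix`, `pca_fit`, `pca_transform`) are hypothetical, and the three neural network classifiers are not shown.

```python
import itertools

import numpy as np

ALPHABET = "ACGT"


def kmer_index(k: int) -> dict:
    """Map every k-mer over ACGT to a column index (lexicographic order)."""
    return {"".join(p): i for i, p in enumerate(itertools.product(ALPHABET, repeat=k))}


def kmer_frequencies(seq: str, k: int) -> np.ndarray:
    """Relative frequency of each of the 4**k k-mers in seq.

    Overlapping windows are counted; windows containing symbols other than
    A/C/G/T (e.g. N) are skipped. Dividing by the total count is an assumption.
    """
    index = kmer_index(k)
    counts = np.zeros(len(index))
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k].upper())
        if j is not None:
            counts[j] += 1
    total = counts.sum()
    return counts / total if total > 0 else counts


def feature_matrix(seqs, k: int) -> np.ndarray:
    """Stack frequency vectors: one row per sequence, one column per k-mer."""
    return np.vstack([kmer_frequencies(s, k) for s in seqs])


def pca_fit(X: np.ndarray, n_components: int):
    """PCA via SVD of the mean-centered matrix; returns (mean, components)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    # Rows of vt are principal axes, ordered by decreasing explained variance.
    return mean, vt[:n_components]


def pca_transform(X: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Project samples onto the retained principal components."""
    return (X - mean) @ components.T


# Hypothetical toy sequences standing in for promoter / non-promoter windows.
seqs = [
    "ACGTACGTTATAAAGGCGCG",
    "TTTTGCGCGCATATAACGTA",
    "GGGCCCATATTTACGATCGA",
    "ACACACGTGTGTTATAAACC",
]
X3 = feature_matrix(seqs, 3)  # codon frequencies, shape (4, 64)
X5 = feature_matrix(seqs, 5)  # pentamer frequencies, shape (4, 1024)

# Assumption: PCA is run on each matrix separately and 2 PCs are kept from each.
features = np.hstack([
    pca_transform(X3, *pca_fit(X3, 2)),
    pca_transform(X5, *pca_fit(X5, 2)),
])
print(features.shape)  # (4, 4)
```

In this sketch the reduced `features` matrix would be the input to the classifiers (promoter vs. exon, vs. intron, vs. 3'UTR); how many PCs to retain is a tuning choice that the abstract does not specify.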

Similar resources

AnG-HPR: Analysis of n-Gram based human Promoter Recognition

We describe a promoter recognition method named AnG-HPR to locate eukaryotic promoter regions and predict transcription start sites (TSSs). N-gram features are extracted and used in promoter prediction: we computed n-grams (n = 2, 3, 4, 5) as features and created frequency features to extract informative and discriminative features for effective classification. Neural network classifie...

Improving the quality of images synthesized by discrete cosines transform – regression based method using principle component analysis

Purpose: Different views of an individual’s image may be required for proper face recognition. Recently, a discrete cosine transform (DCT) based method has been used to synthesize virtual views of an image using only one frontal image. In this work, the performance of two different algorithms for producing virtual views of one frontal image was examined. Materials and Methods: Two new meth...

Patterns Prediction of Chemotherapy Sensitivity in Cancer Cell lines Using FTIR Spectrum, Neural Network and Principal Components Analysis

Drug resistance enables cancer cells to escape the cytotoxic effect of anticancer drugs. Identifying the resistant phenotype is very important because it can lead to an effective treatment plan. There is interest in developing models that classify the resistance phenotype based on multivariate data. We have investigated a vibrational spectroscopic approach in order to characterize a...

Cisplatin Resistant Patterns in Ovarian Cell Line Using FTIR and Principle Component Analysis

Cisplatin is a common chemotherapeutic agent used to treat many solid cancers. Rapid identification of chemotherapy resistance is very important and may lead to an effective treatment plan. Spectroscopic techniques such as infrared spectroscopy, which are sensitive to the biochemical composition of samples, have shown potential to discriminate tissues. Developing in Fourier transform inf...

Journal:
  • Bioinformation

Volume 2

Publication date: 2008